1 Overview

We are strong supporters of open-source tools for reproducible research. Priority index or Pi is developed as a genomic-led target prioritisation system, with the focus on leveraging genetic data to prioritise targets at the gene and pathway level. Based on evidence of genetic association from GWAS data, this prioritisation system is able to resolve modulated genes (seed genes) by utilising knowledge of linkage disequilibrium (co-inherited variants), distance from the gene, and evidence of genetic association with gene expression. Seed genes are scored in an integrative way quantifying the genetic influence. Scored seed genes are subsequently used as baits to rank seed genes plus additional (non-seed) genes; this is achieved by iteratively exploring the global connectivity of a gene interaction network. Genes with the top priority are further used to identify/prioritise pathways that are significantly enriched with highly prioritised genes. Prioritised genes are also used to identify a gene network interconnecting many highly prioritised genes but a few lowly prioritised genes as linkers.

In the Showcases section, we illustrate the power of Pi to perform disease-centric genetic target prioritisation at the gene, pathway and network level using several inflammatory diseases as exemplars, including Ankylosing Spondylitis, Spondyloarthritis, and Systemic Lupus Erythematosus.

In the Ontology annotations & TPN nominations section, we integrate ontology annotations and TPN nominations as an additional support for disease-specific genetic prioritisation by Pi.

In the Cross-disease comparisons section, we summarise and compare results of genetic target prioritisation across diseases, done at the gene, pathway and network level.

2 Web app

A web-based interface of Pi is available here, which allows on-the-fly prioritisation for a user-defined SNP list.

3 Workflow

4 R functions

This intends for end-users who are comfortable with R (with the latest version available at http://cran.r-project.org). Priority functions are all contained in a new package called Pi which can be installed following instructions below:

# install the dependant packages including `XGR` and other packages used here
source("http://bioconductor.org/biocLite.R")
biocLite(c("XGR","devtools","VennDiagram"), siteRepos=c("http://cran.r-project.org"))
library(devtools)
install_github(c("hfang-bristol/XGR", "hfang-bristol/dnet"), dependencies=T)

# also, install the `Pi` package itself
library(devtools)
install_github(c("hfang-bristol/Pi"), dependencies=T)

Priority functions are designed in a nested way. The core relation follows this route: xPrioritiserSNPs -> xPrioritiserGenes -> xPrioritiser -> xRWR, achieving gene-level prioritisation from an input list of SNPs (along with their significant level). The output of this route is taken as the input of either xPrioritiserManhattan for visualising gene priority scores, xPrioritiserPathways for prioritising pathways, or xPrioritiserSubnet for identifying a network of top prioritised genes.

4.1 xRWR

xRWR: implements Random Walk with Restart (RWR) estimating the affinity score of nodes in a graph to a list of seed nodes. The affinity score can be viewed as the influential impact over the graph that is collectively imposed by seed nodes. If the seeds are not given, it will pre-compute affinity matrix for nodes in the input graph with respect to each starting node (as a seed) by looping over every node in the graph. A higher-level function xPrioritiser directly relies on it.

4.2 xPrioritiser

xPrioritiser: uses RWR to calculate the affinity score of nodes in a graph to a list of seed nodes. A node that has a higher affinity score to seed nodes will receive a higher priority score. It is an internal function acting as a general template for RWR-based prioritisation. A higher-level function xPrioritiserGenes directly relies on it.

4.3 xPrioritiserGenes

xPrioritiserGenes: prioritises gene targets from an input gene interaction network and the score info imposed on its seed nodes/genes. This function can be considered to be a specific instance of xPrioritiser, that is, specifying a gene interaction network as a graph and seed nodes as seed genes.

There are two sources of gene interaction network information: the STRING database (Szklarczyk et al. 2015) and the Pathways Commons database (Cerami et al. 2011). STRING is a meta-integration of undirect interactions from a functional aspect, while Pathways Commons mainly contains both undirect and direct interactions from a physical/pathway aspect. In addition to interaction type, users can choose the interactions of varying quality:

Code Interaction (type and quality) Database
STRING_high Functional interactions (with high confidence scores>=700) STRING
STRING_medium Functional interactions (with medium confidence scores>=400) STRING
PCommonsUN_high Physical/undirect interactions (with references & >=2 sources) Pathways Commons
PCommonsUN_medium Physical/undirect interactions (with references & >=1 sources) Pathways Commons
PCommonsDN_high Pathway/direct interactions (with references & >=2 sources) Pathways Commons
PCommonsDN_medium Pathway/direct interactions (with references & >=1 sources) Pathways Commons

4.4 xPrioritiserSNPs

xPrioritiserSNPs: prioritises gene targets from an input gene interaction network and a given list of SNPs together with the significance level (eg GWAS reported p-values). To do so, it first defines seed genes and their scores that are calculated in an integrative manner to quantify the genetic influence under SNPs. Genetic influential score on a seed gene is calculated from the SNP score (reflecting the SNP significant level), the gene-to-SNP distance weight and the eQTL mapping weight. This function can be considered to be a specific instance of xPrioritiserGenes, that is, specifying seed genes plus their scores.

Knowledge of co-inherited variants is also used to include additional SNPs that are in Linkage Disequilibrium (LD) with GWAS lead SNPs. LD SNPs are calculated based on 1000 Genomes Project data (1000 Genomes Project Consortium 2012). LD SNPs are defined to be any SNPs having R2 > 0.8 with GWAS lead SNPs. The user can choose the population used to calculate LD SNPs:

Code Population Project
AFR African 1000 Genomes Project
AMR Admixed American 1000 Genomes Project
EAS East Asian 1000 Genomes Project
EUR European 1000 Genomes Project
SAS South Asian 1000 Genomes Project

4.5 xPrioritiserManhattan

xPrioritiserManhattan: visualises prioritised genes using manhattan plot, in which priority for genes is displayed on the Y-axis along with genomic locations on the X-axis. Also highlighted are genes with the top priority.

4.6 xPrioritiserPathways

xPrioritiserPathways: prioritises pathways based on enrichment analysis of genes with the top priority (eg top 100 genes) using a compendium of pathways. A highly prioritised pathway has its member genes with high gene-level priority scores, that is, having evidence of direct modulation by disease-associated lead (or LD) SNPs, and/or having evidence of indirect modulation at the network level.

In addition to pathways, enrichment analysis can be extended to other ontologies, as listed below:

Code Ontology Category
DO Disease Ontology Disease
GOMF Gene Ontology Molecular Function Function
GOBP Gene Ontology Biological Process Function
GOCC Gene Ontology Cellular Component Function
HPPA Human Phenotype Phenotypic Abnormality Phenotype
HPMI Human Phenotype Mode of Inheritance Phenotype
HPCM Human Phenotype Clinical Modifier Phenotype
HPMA Human Phenotype Mortality Aging Phenotype
MP Mammalian/Mouse Phenotype Phenotype
DGIdb DGI druggable gene categories Druggable
SF SCOP domain superfamilies Domain
PS2 phylostratific age information (our ancestors) Evolution
MsigdbH Hallmark gene sets Hallmark (MsigDB)
MsigdbC1 Chromosome and cytogenetic band positional gene sets Cytogenetics (MsigDB)
MsigdbC2CGP Chemical and genetic perturbation gene sets Perturbation (MsigDB)
MsigdbC2CPall All pathway gene sets Pathways (MsigDB)
MsigdbC2CP Canonical pathway gene sets Pathways (MsigDB)
MsigdbC2KEGG KEGG pathway gene sets Pathways (MsigDB)
MsigdbC2REACTOME Reactome pathway gene sets Pathways (MsigDB)
MsigdbC2BIOCARTA BioCarta pathway gene sets Pathways (MsigDB)
MsigdbC3TFT Transcription factor target gene sets TF targets (MsigDB)
MsigdbC3MIR microRNA target gene sets microRNA targets (MsigDB)
MsigdbC4CGN Cancer gene neighborhood gene sets Cancer (MsigDB)
MsigdbC4CM Cancer module gene sets Cancer (MsigDB)
MsigdbC5BP GO biological process gene sets Function (MsigDB)
MsigdbC5MF GO molecular function gene sets Function (MsigDB)
MsigdbC5CC GO cellular component gene sets Function (MsigDB)
MsigdbC6 Oncogenic signature gene sets Oncology (MsigDB)
MsigdbC7 Immunologic signature gene sets Immunology (MsigDB)

4.7 xPrioritiserSubnet

xPrioritiserSubnet: identifies a gene network that contains as many highly prioritised genes as possible but a few lowly prioritised genes as linkers. An iterative procedure of choosing different priority cutoff is also used to identify the network with a desired number of nodes/genes.

5 Showcases

In this section, we give a step-by-step demo of using Pi to carry out disease-specific genetic prioritisation of targets at the gene and pathway level, focusing on three inflammatory diseases as exemplars: Ankylosing Spondylitis (AS), Spondyloarthritis (SA, including AS and Psoriatic Arthritis), and Systemic Lupus Erythematosus (SLE).

First of all, load the packages including Pi:

library(Pi)
# Following packages are also needed
library(XGR)
library(RCircos)
library(VennDiagram)

# optionally, specify the local location of built-in RData
# by default:
RData.location <- "https://github.com/hfang-bristol/RDataCentre/blob/master/Portal"

5.1 Ankylosing Spondylitis

GWAS SNPs are collected mainly from GWAS Catalog (Welter et al. 2014), complemented by ImmunoBase and latest publications.

Import AS-associated GWAS lead SNPs (with the help of Anna Sanniti)

data.file <- "http://galahad.well.ox.ac.uk/bigdata/AS.txt"
AS <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)

The first 5 rows of the data frame AS are shown below, with the column SNP for AS GWAS SNPs and the column Pval for GWAS-detected P-values.

SNP Pval
rs10045403 5.8e-14
rs1018326 2.0e-06
rs10440635 3.0e-07
rs10781500 1.0e-06
rs10865331 1.9e-19

5.1.1 Gene-level prioritisation

It includes the following steps:

  1. Define seed genes based on distance-to-SNP window and genetic association with gene expression: nearby genes that are located within 200kb distance window of lead or Linkage Disequilibrium (LD) SNPs that are calculated based on European population data from 1000 Genome Project; expression associated genes (eQTL genes) for which gene expression is, either in a cis- or trans-acting manner, significantly associated with lead or LD SNPs, based on immune-stimulated eQTL data in monocytes (Fairfax et al. 2014).

  2. Score seed genes to quantify the genetic influence under lead or LD SNPs.

  3. Prioritise target genes by estimating their global network connectivity to seed genes. With scored seed genes superposed onto a gene interaction network (see above STRING_high), RWR algorithm is implemented to prioritise candidate target genes based on their network connectivity/affinity to seed genes. As such, a gene that has a higher affinity score to seed genes will receive a higher priority score.

pAS <- xPrioritiserSNPs(data=AS, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, decay.kernel="rapid", restart=0.75, RData.location=RData.location)

The results are stored in the data frame pAS$priority, which can be saved into a file AS.genes_priority.txt:

AS_genes <- pAS$priority
write.table(AS_genes, file="AS.genes_priority.txt", sep="\t", row.names=F)

The top 20 genes are listed below:

name seed weight priority rank
CAST 1 52.13154 0.07979 1
ERAP2 1 22.33534 0.03431 2
ERAP1 1 20.02777 0.03099 3
B3GNT2 1 12.76706 0.01958 4
IL23R 1 11.19898 0.01719 5
LNPEP 1 10.36132 0.01603 6
DDX39B 1 7.91010 0.01211 7
IL12RB2 1 7.32161 0.01129 8
HLA-C 1 6.90201 0.01127 9
ETS2 1 7.14037 0.01094 10
PSMG1 1 6.69782 0.01029 11
TBKBP1 1 6.58648 0.01010 12
NFKBIL1 1 6.00768 0.00979 13
ATP6V1G2 1 6.32899 0.00969 14
KIF21B 1 5.82870 0.00950 15
GPR65 1 5.97141 0.00914 16
C1orf106 1 5.21012 0.00806 17
IL6R 1 4.57166 0.00705 18
TNFRSF1A 1 4.24279 0.00685 19
LTA 1 3.77678 0.00642 20

These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.

mp_AS <- xPrioritiserManhattan(pAS, highlight.top=20, cex=0.5, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_AS)

5.1.2 Pathway-level prioritisation

Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).

eTerm <- xPrioritiserPathways(pNode=pAS, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `AS.pathways_priority.txt`
AS_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(AS_pathways), AS_pathways)
write.table(output, file="AS.pathways_priority.txt", sep="\t", row.names=F)

The top 5 prioritised pathways are shown below:

name nAnno nOverlap fc zscore pvalue distance members
IL27-mediated signaling events 26 6 29.80 13.00 7.2e-10 1 TNF, IL12B, TYK2, IL12RB2, TBX21, IL27
Cytokine-cytokine receptor interaction 266 14 6.79 8.47 1.2e-09 3 CSF2RB, TNF, TNFRSF1A, IL12B, LTA, IL6R, IL12RB2, IL2RA, CXCR2, LTB, LTBR, IL23R, IL1R2, CD27
IL23-mediated signaling events 37 6 20.90 10.70 1.1e-08 1 NFKB1, TNF, IL12B, TYK2, NOS2, IL23R
IL12-mediated signaling events 63 7 14.30 9.39 2.4e-08 1 NFKB1, IL12B, TYK2, IL12RB2, IL2RA, NOS2, TBX21
Cytokines can induce activation of matrix metalloproteinases, which degrade extracellular matrix. 15 4 34.40 11.40 6.8e-08 1 TNF, TNFRSF1A, IL6R, FCGR3A

This barplot displays the top 5 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.

df <- AS_pathways[1:5,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()

Similarly, look at enrichments for top genes from different aspects:

  • SCOP domain superfamilies
eTerm <- xPrioritiserPathways(pNode=pAS, priority.top=100, ontology="SF", RData.location=RData.location)
AS_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 8 significant SCOP domains are shown below:

name nAnno nOverlap fc zscore pvalue distance members
Leukotriene A4 hydrolase N-terminal domain 12 4 59.90 15.30 3.6e-09 142 LNPEP, ERAP2, NPEPPS, ERAP1
TNF receptor-like 24 3 22.50 7.87 8.6e-06 298 TNFRSF1A, CD27, LTBR
Immunoglobulin 481 11 4.11 5.20 1.2e-05 73 HLA-C, NCR3, TAPBPL, IL1R2, ICAM3, ICAM4, FCGR2A, FCGR2B, FCGR3A, IL6R, IL12B
p53-like transcription factors 43 3 12.50 5.67 9.2e-05 88 NFKB1, RUNX3, TBX21
TNF-like 53 3 10.20 5.00 2.1e-04 91 LTB, TNF, LTA
Metalloproteases (zincins), catalytic domain 100 4 7.19 4.65 2.3e-04 268 LNPEP, ERAP2, NPEPPS, ERAP1
Fibronectin type III 200 5 4.49 3.72 8.5e-04 77 IL6R, IL12B, IL12RB2, CSF2RB, IL23R
DEATH domain 87 3 6.20 3.64 1.4e-03 68 NFKB1, CARD9, TNFRSF1A

  • GO molecular function
eTerm <- xPrioritiserPathways(pNode=pAS, priority.top=100, ontology="GOMF", RData.location=RData.location)
AS_gomf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 10 significant GO molecular function terms are shown below:

name nAnno nOverlap fc zscore pvalue distance members
Leukotriene A4 hydrolase N-terminal domain 12 4 59.90 15.300 3.6e-09 142 LNPEP, ERAP2, NPEPPS, ERAP1
TNF receptor-like 24 3 22.50 7.870 8.6e-06 298 TNFRSF1A, CD27, LTBR
Immunoglobulin 481 11 4.11 5.200 1.2e-05 73 HLA-C, NCR3, TAPBPL, IL1R2, ICAM3, ICAM4, FCGR2A, FCGR2B, FCGR3A, IL6R, IL12B
p53-like transcription factors 43 3 12.50 5.670 9.2e-05 88 NFKB1, RUNX3, TBX21
TNF-like 53 3 10.20 5.000 2.1e-04 91 LTB, TNF, LTA
Metalloproteases (zincins), catalytic domain 100 4 7.19 4.650 2.3e-04 268 LNPEP, ERAP2, NPEPPS, ERAP1
Fibronectin type III 200 5 4.49 3.720 8.5e-04 77 IL6R, IL12B, IL12RB2, CSF2RB, IL23R
DEATH domain 87 3 6.20 3.640 1.4e-03 68 NFKB1, CARD9, TNFRSF1A
Homeodomain-like 294 3 1.83 1.080 8.1e-02 51 NKX2-3, MIER1, SNAPC4
PH domain-like 420 3 1.28 0.442 2.1e-01 114 TYK2, PLEKHG6, SH2B3

5.1.3 Network-level prioritisation

Network-level prioritisation is to identify a gene network that contains as many highly prioritised genes as possible but a few lowly prioritised genes as linkers. Given a gene network (the same as used in gene-level prioritisation) with nodes labelled with gene priority scores, the search for a maximum-scoring gene subnetwork is briefed as follows:

  1. score transformation, that is, given a threshold of tolerable priority scores, nodes above this threshold (nodes of interest) are scored positively, and negative scores for nodes below the threshold (intolerable).

  2. subnetwork identification, that is, to find an interconnected gene subnetwork enriched with positive-score nodes but allowing for a few negative-score nodes as linkers, done via heuristically solving prize-collecting Steiner tree problem (Fang and Gough 2014).

  3. controlling the subnetwork size, that is, an iterative procedure of scanning different priority thresholds is used to identify the network with a desired number of nodes/genes.

Notably, the preferential use of the same network as used in gene-level prioritisation is due to the fact that gene-level affinity/priority scores are smoothly distributed over the network after being walked. In other words, the chance of identifying such a gene network enriched with top prioritised genes is much higher. To reduce the runtime, by default only top 10% prioritised genes will be used to search for the maximum-scoring gene network.

# find maximum-scoring gene network with the desired node number=50
AS_net <- xPrioritiserSubnet(pNode=pAS, priority.quantite=0.1, subnet.size=50, RData.location=RData.location)

The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

g <- AS_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Circos plot of the gene network is shown below:

# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)

Look at enrichments for genes in the network:

  • Pathways
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 10 significant pathways are shown below:

name nAnno nOverlap fc zscore pvalue distance members
IL12-mediated signaling events 63 10 29.00 16.50 1.9e-14 1 NFKB1, JAK2, IL12B, SOCS1, TYK2, IL12RB2, IL1R1, IL2RA, NOS2, TBX21
IL23-mediated signaling events 37 8 39.50 17.40 2.2e-13 1 NFKB1, PIK3CA, JAK2, TNF, IL12B, TYK2, NOS2, IL23R
Cytokine-cytokine receptor interaction 266 15 10.30 11.40 3.1e-13 3 CSF2RB, TNF, TNFRSF1A, IL12B, LTA, IL6R, IL12RB2, IL1R1, IL2RA, CXCR1, CXCR2, LTB, LTBR, IL23R, IL1R2
IL27-mediated signaling events 26 7 49.20 18.30 6.3e-13 1 JAK2, TNF, IL12B, TYK2, IL12RB2, TBX21, IL27
NO2-dependent IL 12 Pathway in NK cells 17 5 53.70 16.10 2.3e-10 2 JAK2, IL12B, TYK2, IL12RB2, NOS2
Jak-STAT signaling pathway 155 10 11.80 10.10 4.7e-10 3 PIK3CA, CSF2RB, JAK2, IL12B, SOCS1, IL6R, TYK2, IL12RB2, IL2RA, IL23R
Cytokine Signaling in Immune system 268 12 8.18 8.86 1.1e-09 4 PIK3CA, CSF2RB, JAK2, ICAM1, SOCS1, IL6R, TYK2, IL1R1, IL2RA, HLA-C, IL1R2, UBA52
Leishmania infection 72 7 17.80 10.60 4.0e-09 3 NFKB1, JAK2, TNF, IL12B, NOS2, FCGR2A, FCGR3A
Signaling by Interleukins 106 8 13.80 9.83 4.4e-09 4 PIK3CA, CSF2RB, JAK2, IL6R, TYK2, IL1R1, IL2RA, IL1R2
Cytokines can induce activation of matrix metalloproteinases, which degrade extracellular matrix. 15 4 48.70 13.70 1.1e-08 1 TNF, TNFRSF1A, IL6R, FCGR3A

  • Druggable categories
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 5 significant druggable categories are shown below:

name nAnno nOverlap fc zscore pvalue distance members
Cell surface 467 8 3.78 4.21 0.00014 NA TNF, TNFRSF1A, RPS6KB1, SCNN1A, ICAM1, FCGR3A, IL1R1, CXCR2
Drug resistance 349 6 3.79 3.62 0.00073 NA TNF, RPS6KB1, ICAM1, SOCS1, PDE4A, LTA
Neutral zinc metallopeptidase 182 4 4.85 3.56 0.00120 NA ERAP2, LNPEP, NPEPPS, ERAP1
External side of plasma membrane 189 4 4.67 3.46 0.00140 NA TNF, SCNN1A, ICAM1, FCGR3A
Tyrosine kinase 151 3 4.38 2.84 0.00450 NA JAK2, TYK2, PTK2

5.2 Spondyloarthritis

Import SA-associated GWAS lead SNPs (with the help of Anna Sanniti and Alicia Lledo Lara)

data.file <- "http://galahad.well.ox.ac.uk/bigdata/Spondyloarthritis.txt"
SA <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)

5.2.1 Gene-level prioritisation

pSA <- xPrioritiserSNPs(data=SA, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, decay.kernel="rapid", restart=0.75, RData.location=RData.location)

The results are stored in the data frame pSA$priority, which can be saved into a file SA.genes_priority.txt:

SA_genes <- pSA$priority
write.table(SA_genes, file="SA.genes_priority.txt", sep="\t", row.names=F)

The top 20 genes are listed below:

name seed weight priority rank
HLA-C 1 112.08485 0.04467 1
MUC22 1 67.25454 0.02712 2
CAST 1 52.13154 0.02063 3
CCHCR1 1 45.24139 0.01795 4
DDX39B 1 43.38974 0.01717 5
MUC21 1 35.96291 0.01523 6
ATF6B 1 37.30371 0.01477 7
NFKBIL1 1 33.00396 0.01391 8
ATP6V1G2 1 34.76913 0.01376 9
POU5F1 1 32.16604 0.01277 10
ERAP2 1 22.33534 0.00930 11
IER3 1 22.88007 0.00904 12
LTA 1 20.74823 0.00883 13
ERAP1 1 20.02777 0.00838 14
TRAF3IP2 1 20.03787 0.00793 15
TNF 1 19.24358 0.00770 16
MDC1 1 19.39800 0.00768 17
LTB 1 17.17716 0.00752 18
TNIP1 1 18.67828 0.00745 19
ANXA6 1 17.98198 0.00713 20

These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.

mp_SA <- xPrioritiserManhattan(pSA, highlight.top=20, cex=0.4, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_SA)

5.2.2 Pathway-level prioritisation

Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).

eTerm <- xPrioritiserPathways(pNode=pSA, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `SA.pathways_priority.txt`
SA_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(SA_pathways), SA_pathways)
write.table(output, file="SA.pathways_priority.txt", sep="\t", row.names=F)

The top 10 prioritised pathways are shown below:

name nAnno nOverlap fc zscore pvalue distance members
Type I diabetes mellitus 43 7 23.8 12.40 3.6e-10 3 TNF, IL12B, LTA, HLA-DQB1, HLA-C, HLA-E, HLA-DQA1
Allograft rejection 37 6 23.7 11.50 4.3e-09 3 TNF, IL12B, HLA-DQB1, HLA-C, HLA-E, HLA-DQA1
IL23-mediated signaling events 37 6 23.7 11.50 4.3e-09 1 TNF, IL12B, TYK2, NOS2, IL23A, IL23R
IL27-mediated signaling events 26 5 28.1 11.50 1.6e-08 1 STAT2, TNF, IL12B, TYK2, IL12RB2
NO2-dependent IL 12 Pathway in NK cells 17 4 34.4 11.40 7.3e-08 2 IL12B, TYK2, IL12RB2, NOS2
Graft-versus-host disease 41 5 17.8 8.96 3.0e-07 3 TNF, HLA-DQB1, HLA-C, HLA-E, HLA-DQA1
Viral myocarditis 71 6 12.4 7.97 4.6e-07 3 FYN, ICAM1, HLA-DQB1, HLA-C, HLA-E, HLA-DQA1
Leishmania infection 72 5 10.2 6.47 8.7e-06 3 TNF, IL12B, NOS2, HLA-DQB1, HLA-DQA1
Cytokine Network 21 3 20.9 7.57 1.1e-05 2 TNF, IL12B, LTA
IL12 and Stat4 Dependent Signaling Pathway in Th1 Development 23 3 19.1 7.20 1.6e-05 2 IL12B, TYK2, IL12RB2

This barplot displays the top 10 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.

df <- SA_pathways[1:10,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()

Similarly, look at enrichments for top genes from different aspects:

  • SCOP domain superfamilies
eTerm <- xPrioritiserPathways(pNode=pSA, priority.top=100, ontology="SF", RData.location=RData.location)
SA_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 6 significant SCOP domains are shown below:

name nAnno nOverlap fc zscore pvalue distance members
Leukotriene A4 hydrolase N-terminal domain 12 3 53.00 12.40 2.2e-07 142 LNPEP, ERAP2, ERAP1
MHC antigen-recognition domain 40 4 21.20 8.81 1.1e-06 235 HLA-C, HLA-DQA1, HLA-DQB1, HLA-E
Immunoglobulin 481 10 4.41 5.24 1.3e-05 73 HLA-C, HLA-DQA1, HLA-DQB1, HLA-E, NCR3, ICAM1, ICAM3, ICAM4, ICAM5, IL12B
TNF-like 53 3 12.00 5.53 1.1e-04 91 LTB, TNF, LTA
Metalloproteases (zincins), catalytic domain 100 3 6.36 3.71 1.3e-03 268 LNPEP, ERAP2, ERAP1
SH2 domain 112 3 5.68 3.42 1.9e-03 269 TYK2, STAT2, FYN

  • GO molecular function
eTerm <- xPrioritiserPathways(pNode=pSA, priority.top=100, ontology="GOMF", RData.location=RData.location)
SA_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 10 significant GO molecular function terms are shown below:

name nAnno nOverlap fc zscore pvalue distance members
peptide antigen binding 36 4 21.90 8.97 9.7e-07 3 HLA-E, HLA-DQB1, HLA-C, HLA-DQA1
aminopeptidase activity 28 3 21.10 7.61 1.1e-05 6 ERAP2, LNPEP, ERAP1
tumor necrosis factor receptor binding 30 3 19.70 7.33 1.5e-05 6 TNF, LTA, LTB
integrin binding 105 4 7.52 4.78 1.9e-04 4 ICAM1, ICAM3, ICAM4, ICAM5
cytokine activity 172 5 5.74 4.46 2.3e-04 4 TNF, IL12B, LTA, LTB, IL23A
receptor binding 321 7 4.30 4.27 2.1e-04 3 HLA-E, HLA-C, ICAM3, LTA, LTB, NOS2, APOF
ATP-dependent RNA helicase activity 63 3 9.40 4.77 2.9e-04 9 DHX30, SKIV2L, DDX39B
transcription regulatory region DNA binding 182 4 4.34 3.23 2.3e-03 6 ATF6B, POU5F1, TNF, XRCC6
sequence-specific DNA binding transcription factor activity 825 8 1.91 1.93 2.3e-02 2 ZNF668, ZGLP1, ETS2, GTF2H4, ATF6B, REL, STAT2, POU5F1
signal transducer activity 227 3 2.61 1.74 2.8e-02 2 STAT2, SLC44A2, KCNH7

5.2.3 Network-level prioritisation

# find maximum-scoring gene network with the desired node number=50
SA_net <- xPrioritiserSubnet(pNode=pSA, priority.quantite=0.1, subnet.size=50, RData.location=RData.location)

The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

g <- SA_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Circos plot of the gene network is shown below:

# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)

Look at enrichments for genes in the network:

  • Pathways
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 10 significant pathways are shown below:

name nAnno nOverlap fc zscore pvalue distance members
IL23-mediated signaling events 37 6 32.30 13.60 4.5e-10 1 TNF, IL12B, TYK2, NOS2, IL23A, IL23R
Monocyte and its Surface Molecules 11 4 72.50 16.80 1.1e-09 2 ITGB1, ICAM1, ITGAL, ITGAM
IL27-mediated signaling events 26 5 38.30 13.50 2.4e-09 1 STAT2, TNF, IL12B, TYK2, IL12RB2
Adhesion and Diapedesis of Granulocytes 14 4 56.90 14.90 4.9e-09 2 ICAM1, ITGAL, TNF, ITGAM
Beta2 integrin cell surface interactions 29 5 34.40 12.80 4.9e-09 1 ICAM1, ITGAL, ITGAM, ICAM3, ICAM4
Cytokine-cytokine receptor interaction 266 11 8.24 8.52 4.2e-09 3 TNF, TNFRSF1A, IL12B, LTA, IL6R, IL12RB2, CTF1, IL23A, LTB, LTBR, IL23R
Cells and Molecules involved in local acute inflammatory response 17 4 46.90 13.50 1.5e-08 2 ITGB1, ICAM1, ITGAL, TNF
NO2-dependent IL 12 Pathway in NK cells 17 4 46.90 13.50 1.5e-08 2 IL12B, TYK2, IL12RB2, NOS2
Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell 65 6 18.40 10.00 2.7e-08 4 ITGB1, ICAM1, ITGAL, HLA-C, ICAM3, ICAM4
Jak-STAT signaling pathway 155 8 10.30 8.28 5.6e-08 3 STAT2, IL12B, IL6R, TYK2, IL12RB2, CTF1, IL23A, IL23R

  • Druggable categories
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 5 significant druggable categories are shown below:

name nAnno nOverlap fc zscore pvalue distance members
Drug resistance 349 7 4.59 4.57 8.3e-05 NA TNF, ITGB1, ICAM1, TP53, FYN, LTA, GPX3
Tumor suppressor 716 9 2.87 3.54 5.5e-04 NA TNF, ITGB1, TNFAIP3, TP53, UBA52, MDC1, IL12B, CSNK2A1, CDC37
External side of plasma membrane 189 4 4.84 3.55 1.2e-03 NA TNF, ITGAL, ITGB1, ICAM1
Cell surface 467 6 2.94 2.89 3.2e-03 NA TNF, TNFRSF1A, STX4, ITGAL, ITGB1, ICAM1
Dna repair 385 5 2.97 2.65 5.5e-03 NA TP53, XRCC6, UBA52, SSRP1, MDC1

5.3 Systemic Lupus Erythematosus

Import SLE-associated GWAS lead SNPs

data.file <- "http://galahad.well.ox.ac.uk/bigdata/SLE.txt"
SLE <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)

5.3.1 Gene-level prioritisation

pSLE <- xPrioritiserSNPs(data=SLE, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, decay.kernel="rapid", restart=0.75, RData.location=RData.location)

The results are stored in the data frame pSLE$priority, which can be saved into a file SLE.genes_priority.txt:

SLE_genes <- pSLE$priority
write.table(SLE_genes, file="SLE.genes_priority.txt", sep="\t", row.names=F)

The top 20 genes are listed below:

name seed weight priority rank
ATF6B 1 149.00804 0.02673 1
CLIC1 1 128.32977 0.02307 2
SKIV2L 1 116.73809 0.02112 3
FKBPL 1 105.93050 0.01897 4
BAG6 1 94.67381 0.01830 5
CFB 1 87.25338 0.01567 6
PRRC2A 1 77.76345 0.01492 7
AIF1 1 78.46080 0.01464 8
NELFE 1 79.37208 0.01448 9
HSPA1L 1 80.01154 0.01440 10
VARS 1 80.17340 0.01438 11
LSM2 1 80.17340 0.01438 12
HSPA1A 1 79.01829 0.01422 13
NEU1 1 79.11303 0.01420 14
CSNK2B 1 77.76345 0.01394 15
HSPA1B 1 76.94512 0.01385 16
EHMT2 1 75.18386 0.01350 17
PPT2 1 75.07368 0.01347 18
DDAH2 1 75.23853 0.01347 19
LTA 1 69.71086 0.01320 20

These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.

mp_SLE <- xPrioritiserManhattan(pSLE, highlight.top=20, cex=0.4, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_SLE)

5.3.2 Pathway-level prioritisation

Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).

eTerm <- xPrioritiserPathways(pNode=pSLE, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `SLE.pathways_priority.txt`
SLE_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(SLE_pathways), SLE_pathways)
write.table(output, file="SLE.pathways_priority.txt", sep="\t", row.names=F)

The top 10 prioritised pathways are shown below:

name nAnno nOverlap fc zscore pvalue distance members
Leishmania infection 72 10 17.90 12.70 5.5e-12 3 HLA-DRA, STAT1, TNF, NCF2, ITGAM, HLA-DQA2, HLA-DQB1, HLA-DOB, HLA-DPB1, HLA-DQA1
Asthma 30 7 30.10 14.10 4.4e-11 3 HLA-DRA, TNF, HLA-DQA2, HLA-DQB1, HLA-DOB, HLA-DPB1, HLA-DQA1
Type I diabetes mellitus 43 8 24.00 13.40 2.7e-11 3 HLA-DRA, TNF, LTA, HLA-DQA2, HLA-DQB1, HLA-DOB, HLA-DPB1, HLA-DQA1
Antigen processing and presentation 88 10 14.70 11.40 5.3e-11 3 HLA-DRA, LTA, HSPA1A, HLA-DQA2, HLA-DQB1, HLA-DOB, HLA-DPB1, HLA-DQA1, HSPA1B, HSPA1L
Allograft rejection 37 7 24.40 12.60 2.8e-10 3 HLA-DRA, TNF, HLA-DQA2, HLA-DQB1, HLA-DOB, HLA-DPB1, HLA-DQA1
Graft-versus-host disease 41 7 22.00 11.90 6.7e-10 3 HLA-DRA, TNF, HLA-DQA2, HLA-DQB1, HLA-DOB, HLA-DPB1, HLA-DQA1
Intestinal immune network for IgA production 48 6 16.10 9.29 7.0e-08 3 HLA-DRA, HLA-DQA2, HLA-DQB1, HLA-DOB, HLA-DPB1, HLA-DQA1
Systemic lupus erythematosus 139 9 8.35 7.72 9.6e-08 3 HLA-DRA, TNF, C2, C4A, HLA-DQA2, HLA-DQB1, HLA-DOB, HLA-DPB1, HLA-DQA1
Autoimmune thyroid disease 52 6 14.90 8.87 1.2e-07 3 HLA-DRA, HLA-DQA2, HLA-DQB1, HLA-DOB, HLA-DPB1, HLA-DQA1
Viral myocarditis 71 6 10.90 7.40 1.1e-06 3 HLA-DRA, HLA-DQA2, HLA-DQB1, HLA-DOB, HLA-DPB1, HLA-DQA1

This barplot displays the top 10 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.

df <- SLE_pathways[1:10,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()

Similarly, look at enrichments for top genes from different aspects:

  • SCOP domain superfamilies
eTerm <- xPrioritiserPathways(pNode=pSLE, priority.top=100, ontology="SF", RData.location=RData.location)
SLE_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 8 significant SCOP domains are shown below:

name nAnno nOverlap fc zscore pvalue distance members
MHC antigen-recognition domain 40 6 24.9 11.80 3.5e-09 235 HLA-DOB, HLA-DPB1, HLA-DQA1, HLA-DQA2, HLA-DQB1, HLA-DRA
Heat shock protein 70kD (HSP70), peptide-binding domain 12 3 41.5 10.90 5.8e-07 83 HSPA1A, HSPA1B, HSPA1L
Heat shock protein 70kD (HSP70), C-terminal subdomain 12 3 41.5 10.90 5.8e-07 69 HSPA1A, HSPA1B, HSPA1L
vWA-like 95 6 10.5 7.22 1.6e-06 187 CFB, C2, ITGAD, ITGAM, ITGAX, XRCC6
Integrin alpha N-terminal domain 23 3 21.6 7.71 9.9e-06 127 ITGAD, ITGAM, ITGAX
Integrin domains 25 3 19.9 7.37 1.4e-05 75 ITGAD, ITGAM, ITGAX
TNF-like 53 4 12.5 6.54 1.6e-05 91 LTB, TNFSF4, TNF, LTA
Nucleotidylyl transferase 29 3 17.2 6.78 2.6e-05 166 VARS, VARS2, NMNAT2

  • GO molecular function
eTerm <- xPrioritiserPathways(pNode=pSLE, priority.top=100, ontology="GOMF", RData.location=RData.location)
SLE_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 10 significant GO molecular function terms are shown below:

name nAnno nOverlap fc zscore pvalue distance members
MHC class II receptor activity 12 5 72.30 18.80 2.7e-11 5 HLA-DQB1, HLA-DRA, HLA-DQA1, HLA-DOB, HLA-DQA2
tumor necrosis factor receptor binding 30 5 28.90 11.70 1.6e-08 6 STAT1, TNF, LTA, LTB, TNFSF4
peptide antigen binding 36 4 19.30 8.36 1.8e-06 3 HLA-DQB1, HLA-DRA, HLA-DPB1, HLA-DQA1
ATP-dependent RNA helicase activity 63 3 8.26 4.40 4.7e-04 9 DHX30, SKIV2L, DDX39B
receptor binding 321 7 3.78 3.84 5.2e-04 3 CSNK2B, HSPA1A, HSPA1B, LTA, BLK, LTB, TNFSF4
protease binding 85 3 6.12 3.61 1.5e-03 4 TNFAIP3, PYCARD, TNF
cytokine activity 172 4 4.04 3.05 3.1e-03 4 TNF, LTA, LTB, TNFSF4
actin filament binding 113 3 4.61 2.93 4.1e-03 4 AIF1, ARPC5, FLNC
SH3 domain binding 116 3 4.49 2.87 4.5e-03 4 PTTG1, MAPK15, PTPN22
ATP binding 1454 15 1.79 2.42 8.2e-03 5 UBE2L3, HSPA1L, HSPA1A, HSPA1B, DHX30, BLK, XRCC6, SKIV2L, MAPK15, VARS, VARS2, DDX39B, NADSYN1, NMNAT2, TYK2

5.3.3 Network-level prioritisation

# find maximum-scoring gene network with the desired node number=50
SLE_net <- xPrioritiserSubnet(pNode=pSLE, priority.quantite=0.1, subnet.size=50, RData.location=RData.location)

The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

g <- SLE_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Circos plot of the gene network is shown below:

# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)

Look at enrichments for genes in the network:

  • Pathways
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 10 significant pathways are shown below:

name nAnno nOverlap fc zscore pvalue distance members
Immune System 912 30 4.37 9.36 3.7e-14 4 IL2, CD80, HRAS, STAT1, ICAM1, FCGR2B, CSK, CDKN1B, C2, NCF2, SOCS1, IRAK1, TYK2, CD44, SKP1, PSMB2, RASGRP3, ICAM3, IRF7, FCGR3A, RASGRP1, PSMB8, PSMB9, UBA52, IRF5, BLK, AGER, IRF8, SQSTM1, ICAM4
Beta2 integrin cell surface interactions 29 7 32.10 14.60 2.5e-11 1 ICAM1, ITGAX, ITGAM, ICAM3, FCGR2A, ITGAD, ICAM4
Leishmania infection 72 9 16.60 11.60 1.1e-10 3 IL10, STAT1, TNF, IL12A, NCF2, ITGAM, IRAK1, FCGR2A, FCGR3A
Cytokine Signaling in Immune system 268 15 7.44 9.32 8.0e-11 4 IL2, HRAS, STAT1, ICAM1, SOCS1, IRAK1, TYK2, CD44, SKP1, IRF7, PSMB8, UBA52, IRF5, IRF8, SQSTM1
IL27-mediated signaling events 26 6 30.70 13.20 5.8e-10 1 IL2, STAT1, STAT4, TNF, IL12A, TYK2
Adaptive Immune System 524 19 4.82 7.85 7.2e-10 4 CD80, HRAS, ICAM1, FCGR2B, CSK, CDKN1B, NCF2, SOCS1, SKP1, PSMB2, RASGRP3, ICAM3, FCGR3A, RASGRP1, PSMB8, PSMB9, UBA52, BLK, ICAM4
Downstream Signaling Events Of B Cell Receptor (BCR) 94 9 12.70 9.95 1.6e-09 4 HRAS, CDKN1B, SKP1, PSMB2, RASGRP3, RASGRP1, PSMB8, PSMB9, UBA52
Signaling by the B Cell Receptor (BCR) 123 10 10.80 9.53 1.5e-09 4 HRAS, CDKN1B, SKP1, PSMB2, RASGRP3, RASGRP1, PSMB8, PSMB9, UBA52, BLK
Interferon gamma signaling 62 7 15.00 9.63 1.6e-08 4 STAT1, ICAM1, SOCS1, CD44, IRF7, IRF5, IRF8
Interferon alpha/beta signaling 64 7 14.50 9.46 2.1e-08 4 STAT1, SOCS1, TYK2, IRF7, PSMB8, IRF5, IRF8

  • Druggable categories
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 5 significant druggable categories are shown below:

name nAnno nOverlap fc zscore pvalue distance members
External side of plasma membrane 189 7 6.02 5.51 1.5e-05 NA TNFRSF4, TNF, ITGAX, ICAM1, CD80, FCGR3A, CD44
Drug resistance 349 7 3.26 3.42 1.1e-03 NA TNF, ICAM1, SOCS1, IL10, XBP1, GPX3, NCF2
Tumor suppressor 716 11 2.50 3.35 8.3e-04 NA TNF, CDKN1B, HRAS, UBA52, IL12A, ERN1, PSMB2, PSMB8, PSMB9, ETS1, CDC37
Cell surface 467 8 2.78 3.15 1.7e-03 NA TNFSF4, TNFRSF4, TNF, ITGAX, ICAM1, CD80, FCGR3A, CD44
Tyrosine kinase 151 3 3.23 2.18 1.3e-02 NA CSK, TYK2, BLK

5.4 Dupuytren’s disease (DD)

Import DD-associated GWAS lead SNPs

data.file <- "http://galahad.well.ox.ac.uk/bigdata/Dupuytren.txt"
DD <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)

The 13 DD-associated SNPs are:

SNPs Pvalue
rs16879765 6e-39
rs6519955 3e-33
rs611744 8e-15
rs11672517 7e-14
rs2912522 2e-13
rs8124695 8e-10
rs7524102 3e-09
rs10809650 6e-09
rs4730775 3e-08
rs2179367 3e-07
rs4789939 6e-07
rs4932194 8e-07
rs12032381 6e-06

5.4.1 Gene-level prioritisation

pDD <- xPrioritiserSNPs(data=DD, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, decay.kernel="rapid", restart=0.75, RData.location=RData.location)

The results are stored in the data frame pDD$priority, which can be saved into a file DD.genes_priority.txt:

DD_genes <- pDD$priority
write.table(DD_genes, file="DD.genes_priority.txt", sep="\t", row.names=F)

The top 20 genes are listed below:

name seed weight priority rank
PRR34 1 11.96220 0.11660 1
SFRP4 1 12.43433 0.11429 2
NME8 1 10.15061 0.09302 3
WNT7B 1 8.90219 0.08214 4
EIF3E 1 4.89795 0.04480 5
DUXA 1 4.42695 0.04138 6
RSPO2 1 4.50962 0.04136 7
ZIM3 1 3.52149 0.03572 8
USP29 1 3.01678 0.03165 9
BPIFC 0 0.00000 0.02915 10
PPARA 1 2.79592 0.02560 11
TIMP2 1 2.23830 0.02052 12
AURKC 1 2.04431 0.01872 13
WNT2 1 1.61094 0.01569 14
ZC3H12D 1 1.04283 0.00953 15
NUP43 1 0.96004 0.00878 16
ZBTB40 1 0.91339 0.00849 17
PCMT1 1 0.91664 0.00838 18
ZNF460 1 0.82780 0.00758 19
TAB2 1 0.80465 0.00737 20

These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.

mp_DD <- xPrioritiserManhattan(pDD, highlight.top=20, cex=0.4, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_DD)

5.4.2 Pathway-level prioritisation

Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).

eTerm <- xPrioritiserPathways(pNode=pDD, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `DD.pathways_priority.txt`
DD_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(DD_pathways), DD_pathways)
write.table(output, file="DD.pathways_priority.txt", sep="\t", row.names=F)

The top 10 prioritised pathways are shown below:

name nAnno nOverlap fc zscore pvalue distance members
Basal cell carcinoma 55 29 78.40 47.4 0.0e+00 3 WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B
Melanogenesis 101 29 42.70 34.7 0.0e+00 3 WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B
Wnt signaling pathway 151 32 31.50 31.1 0.0e+00 3 WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B, SFRP4, DKK2, DKK1
Class B/2 (Secretin family receptors) 87 27 46.10 34.8 0.0e+00 4 WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3
Genes related to Wnt-mediated signal transduction 89 26 43.40 33.1 0.0e+00 1 WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT11, WNT2B, FZD4, WNT9A, FZD6, FZD7, FZD2, FZD8, WNT10A, WNT3A, WNT16, FZD5, FZD3, WNT5B, SFRP4, DKK1
Wnt signaling network 28 17 90.20 38.9 0.0e+00 1 WNT1, FZD1, WNT2, WNT3, WNT5A, WNT7A, WNT7B, FZD4, FZD6, FZD7, FZD2, FZD8, FZD9, WNT3A, FZD10, FZD5, DKK1
Hedgehog signaling pathway 56 19 50.40 30.5 0.0e+00 3 WNT1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, WNT9A, WNT9B, WNT10A, WNT3A, WNT16, WNT5B
Pathways in cancer 325 29 13.30 18.5 0.0e+00 3 WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B
GPCR ligand binding 400 27 10.00 15.2 0.0e+00 4 WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3
Genes encoding secreted soluble factors 344 21 9.07 12.6 1.4e-16 1 WNT1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, WNT9A, WNT9B, WNT10A, WNT3A, WNT16, WNT5B, SFRP4, MEGF6

This barplot displays the top 10 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.

df <- DD_pathways[1:10,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()

Similarly, look at enrichments for top genes from different aspects:

  • SCOP domain superfamilies
eTerm <- xPrioritiserPathways(pNode=pDD, priority.top=100, ontology="SF", RData.location=RData.location)
DD_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 6 significant SCOP domains are shown below:

name nAnno nOverlap fc zscore pvalue distance members
Frizzled cysteine-rich domain 20 11 106.00 34.00 0.0e+00 24 SFRP4, FZD1, FZD4, FZD6, FZD7, FZD8, FZD10, FZD9, FZD3, FZD5, FZD2
Thioredoxin-like 127 12 18.20 14.10 6.8e-14 179 TXNL1, TXNRD1, TXNRD3, PRDX2, PRDX1, NME8, PRDX3, TXNDC2, PRDX5, TXNDC8, TXN2, PRDX4
FAD/NAD(P)-binding domain 54 3 10.70 5.17 1.7e-04 168 TXNRD1, TXNRD3, TXNRD2
Growth factor receptor domain 127 4 6.08 4.15 5.0e-04 303 RSPO4, MEGF6, RSPO3, RSPO2
Cysteine proteinases 149 3 3.89 2.56 7.4e-03 246 USP5, USP29, USP36
Homeodomain-like 294 4 2.63 2.04 1.8e-02 51 DUXA, ARGFX, TPRX1, LEUTX

  • GO molecular function
eTerm <- xPrioritiserPathways(pNode=pDD, priority.top=100, ontology="GOMF", RData.location=RData.location)
DD_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

The top 10 significant GO molecular function terms are shown below:

name nAnno nOverlap fc zscore pvalue distance members
frizzled binding 37 23 102.00 48.10 0.0e+00 5 WNT5A, ZNRF3, WNT3A, WNT1, WNT3, RSPO3, WNT7A, WNT5B, FZD1, FZD7, WNT4, WNT11, WNT2, WNT8A, WNT10B, WNT6, WNT7B, WNT8B, WNT10A, WNT2B, WNT9A, WNT9B, WNT16
Wnt-activated receptor activity 21 11 85.70 30.50 0.0e+00 5 FZD4, FZD1, FZD8, SFRP4, FZD5, FZD2, FZD6, FZD7, FZD9, FZD3, FZD10
Wnt-protein binding 28 11 64.30 26.30 4.0e-20 3 FZD4, FZD1, FZD8, SFRP4, FZD5, FZD2, FZD6, FZD7, FZD9, FZD3, FZD10
receptor agonist activity 15 9 98.20 29.50 1.3e-19 4 WNT5A, WNT3A, WNT1, WNT3, WNT7A, WNT4, WNT2, WNT8A, WNT10B
protein disulfide oxidoreductase activity 23 6 42.70 15.70 5.6e-11 5 TXNRD1, TXNRD3, TXN2, TXNDC2, TXNDC8, TXNL1
PDZ domain binding 86 6 11.40 7.60 9.1e-07 4 FZD4, FZD1, FZD8, FZD2, FZD7, FZD3
G-protein coupled receptor activity 643 12 3.05 4.18 1.4e-04 5 FZD4, FZD1, FZD8, SFRP4, FZD5, FZD2, FZD6, FZD7, FZD9, FZD3, FZD10, LGR6
G-protein coupled receptor binding 45 3 10.90 5.22 1.6e-04 4 RSPO3, RSPO2, RSPO4
flavin adenine dinucleotide binding 62 3 7.92 4.28 5.5e-04 4 TXNRD1, TXNRD3, TXNRD2
microtubule motor activity 68 3 7.22 4.03 7.9e-04 8 DNAH11, DNAH5, DNAI2

5.4.3 Network-level prioritisation

# find maximum-scoring gene network with the desired node number=20
DD_net <- xPrioritiserSubnet(pNode=pDD, priority.quantite=0.1, subnet.size=21, RData.location=RData.location)

The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

g <- DD_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Circos plot of the gene network is shown below:

# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)

Look at enrichments for genes in the network:

  • Pathways
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 10 significant pathways are shown below:

name nAnno nOverlap fc zscore pvalue distance members
Basal cell carcinoma 55 9 75.50 25.80 3.5e-18 3 WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10
Class B/2 (Secretin family receptors) 87 9 47.70 20.40 4.7e-16 4 WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10
Genes related to Wnt-mediated signal transduction 89 9 46.70 20.20 5.9e-16 1 WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, SFRP4
Melanogenesis 101 9 41.10 18.90 2.2e-15 3 WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10
Wnt signaling pathway 151 10 30.60 17.10 1.8e-15 3 WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10, SFRP4
Wnt signaling network 28 6 98.90 24.20 7.4e-14 1 WNT2, WNT7B, FZD4, FZD7, FZD8, FZD10
Pathways in cancer 325 11 15.60 12.50 2.2e-13 3 EGFR, WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10, MMP9
Hedgehog signaling pathway 56 5 41.20 14.10 1.3e-09 3 WNT2, WNT6, WNT7B, WNT11, WNT10A
GPCR ligand binding 400 9 10.40 8.95 2.2e-09 4 WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10
Signaling by GPCR 906 10 5.09 6.06 4.7e-07 4 EGFR, WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10

  • Druggable categories
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)

Enrichment results for the top 3 significant druggable categories are shown below:

name nAnno nOverlap fc zscore pvalue distance members
Cell surface 467 6 5.67 5.0 2.9e-05 NA SFRP4, FZD10, WNT6, FZD4, RSPO2, TIMP2
G protein coupled receptor 855 4 2.06 1.6 3.4e-02 NA FZD10, FZD4, FZD7, FZD8
NA NA NA NA NA NA NA NA

6 Ontology annotations & TPN nominations

In this section, we first describe annotation predictors using ontologies and target genes nominated by ULTRA-DD TPN members, and then integrate them as an additional support for disease-specific genetic prioritisation by Pi.

6.1 Annotation predictors

Using ontologies to annotate genes is one of the most effective ways to reuse existing knowledge. Also it is a scalable way to capture a particular knowledge sphere in an systematic way. An ontology is like a dictionary, containing vocabularies (called terms) and their relations. These terms and relations are understandable by humans, and readable by computers. According to how to organise relationships between terms, there are types of ontologies:

  1. structured ontology: terms are organised in a tree-like structure, such as Gene Ontology (Ashburner et al. 2000), Disease Ontology (Schriml et al. 2012), and Phenotype Ontologies in human and mouse (Köhler et al. 2013; C. L. Smith and Eppig 2009);
  2. non-structured ontology: terms are not organised as tree but simply listed as keywords, such as a collection of pathways, and gene druggable categories.

When integrating ontologies of different knowledge spheres, the use is limited by the heterogeneous nature: different terms/keywords represent the same or similar knowledge. For this reason, they are better used in a broader context to capture a general category of knowledge, eg a group of diseases instead of a specific disease. We consider the context broadly relevant to immune-related diseases. We choose 5 annotation predictors as informative carriers generalising their relatedness to immunity/inflammation diseases. For a given gene, we look at:

  1. OMIM evidence: whether it was identified in OMIM as an immune disease ontology gene;
  2. Phenotype evidence: whether it is annotated to immune-related phenotypes from Human Phenotype Ontology (HPO) and/or Mouse Phenotype Ontology (MPO);
  3. Function evidence: whether it is annotated to immune/inflammatory terms from Gene Ontology;
  4. Pathway evidence: whether it is annotated to immune-related pathways (from KEGG, BioCarta and Reactome);
  5. Druggable evidence: whether it is annotated to gene druggable categories from DGIdb.

Since the above predictors are relatively independent to each other, additive scoring scheme is used to calculate annotation scores per gene.

Import annotation predictors (pre-computed)

data.file <- "http://galahad.well.ox.ac.uk/bigdata/iAnno.txt"
iAnno <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)
# gene/target info
iSymbol <- iAnno[,2]
iGenes <- iAnno[,1:3]
# predictor info (including individual scores)
iPS <- iAnno[,22:26]
# additive predictor score
PS <- apply(iPS, 1, sum)
# combine additive predictor score and individual acores
iPS <- cbind(PS=PS, iPS)

The first 5 rows of the data frame iGenes are:

Target.GeneID Target.Symbol Target.Name
1 A1BG alpha-1-B glycoprotein
2 A2M alpha-2-macroglobulin
3 A2MP1 alpha-2-macroglobulin pseudogene 1
9 NAT1 N-acetyltransferase 1 (arylamine N-acetyltransferase)
10 NAT2 N-acetyltransferase 2 (arylamine N-acetyltransferase)

Their predictor info is stored in the data frame iPS, with the column PS for additive predictor scores:

PS OMIM Phenotype Function Pathway Druggable
0 0 0 0 0 0
1 0 0 0 0 1
0 0 0 0 0 0
0 0 0 0 0 0
1 1 0 0 0 0

Distributions of annotation predictor scores

df <- iPS
p <- ggplot(df, aes(PS))
p + geom_bar(fill="deepskyblue") + scale_y_log10() + geom_text(stat="bin",binwidth=1,aes(label=..count..),vjust=0,color="blue") + ylab("Number of genes (predicted)") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + scale_x_continuous(breaks=0:max(df$PS)) + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue"))

Pairwise correlations between predictors

df <- df[df$PS>0, -1]
res <- supraHex::sDistance(t(df), metric="pearson")
diag(res) <- 1
cellnote <- signif(1-res, digits=2)
diag(cellnote) <- NA
visHeatmapAdv(1-res, Rowv=F, Colv=F, colormap="bwr", zlim=c(-1,1), key=F, cexRow=1, cexCol=1, cellnote=cellnote, notecex=0.8, notecol="black", lhei=c(1,5), lwid=c(1,5))

6.2 TPN nominations

# TPN nomination info
iTPN <- iAnno[,9:15]
# flag whether nominated by ULTRA-DD TPN members
iNominated <- iAnno[,9]
# info provided by SGC Oxford
iSGC <- iAnno[,4:8]
# iSGC restricted to nominated targets
df <- iSGC[iNominated>0, ]
df$SGC_PI <- !(df$SGC_PI %in% c("Unassigned","Unassigned - TO",""))

Gene family analysis

Below show results of family analysis (provided by Brian and David).

The top family is integral membrane proteins (IMP), followed by histone proteins (histone lysine demethylase; KDM) and ubiquitin-related proteins. Other top families include DENN domain proteins.

Allocations to SGC PIs

As seen below, 714 genes with 490 (68%) have been allocation to specific SGC-Oxford PIs.

SGC PI allocations are conditioned on SG/ChemBio Tractability:

Generally speaking, targets with good tractability are more likely to be allocated to SGC PIs.

Distributions of annotation predictor scores for nominated genes

Distributions of annotation predictor scores are conditioned on whether genes are nominated or not:

It can be seen that nominated genes tend to receive a higher annotation scores than non-nominated genes do.

6.3 Integration for Ankylosing Spondylitis

Recall: AS genes prioritised by Pi are stored in the data frame AS_genes, and the data frame iPS for annotation predictor scores for all genes stored in iSymbol or iGenes (see above).

# find genes with annotation predictors  
ind <- match(rownames(AS_genes), iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine AS_genes iPS iGenes iNominated
AS_combined <- cbind(AS_genes[,c(1,4)], genes_ap)

The top 20 AS-specific genes along with annotation predictors and TPN nominations are listed below:

name priority PS OMIM Phenotype Function Pathway Druggable Nominated
CAST 0.07979 1 0 0 0 0 1 0
ERAP2 0.03431 0 0 0 0 0 0 0
ERAP1 0.03099 2 0 0 1 1 0 0
B3GNT2 0.01958 0 0 0 0 0 0 0
IL23R 0.01719 3 1 1 1 0 0 0
LNPEP 0.01603 1 0 0 0 1 0 0
DDX39B 0.01211 1 0 0 0 0 1 0
IL12RB2 0.01129 1 0 1 0 0 0 0
HLA-C 0.01127 3 1 0 1 1 0 0
ETS2 0.01094 0 0 0 0 0 0 0
PSMG1 0.01029 0 0 0 0 0 0 0
TBKBP1 0.01010 0 0 0 0 0 0 0
NFKBIL1 0.00979 1 1 0 0 0 0 0
ATP6V1G2 0.00969 0 0 0 0 0 0 0
KIF21B 0.00950 0 0 0 0 0 0 0
GPR65 0.00914 1 0 0 1 0 0 1
C1orf106 0.00806 0 0 0 0 0 0 0
IL6R 0.00705 5 1 1 1 1 1 0
TNFRSF1A 0.00685 4 1 1 1 0 1 1
LTA 0.00642 3 1 0 0 1 1 0

Comparing genetic priority and annotation prediction

This violin plot displays density (the violin shape) of genes receiving AS-specific genetic priority versus annotation predictor scores (general indicator of immune-relatedness).

hline <- quantile(AS_combined$priority,0.95) # genetic priority at the top 5%
hline1 <- quantile(AS_combined$priority,1) # genetic priority at the top
p <- ggplot(AS_combined, aes(factor(PS), priority))
p + geom_violin(trim=T,adjust=0.5) + scale_y_log10() + ylab("AS-specific genetic priority") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=hline,color="darkgreen",linetype="dotdash") + geom_vline(xintercept=c(2,5)+0.5,color="transparent",linetype="dotdash") + geom_segment(aes(x=2.5,xend=2.5,y=hline,yend=hline1),color="red",linetype="dotdash") + geom_segment(aes(x=4.5,xend=4.5,y=hline,yend=hline1),color="red",linetype="dotdash")

A horizontal line is for genetic priority at the top 5%, and two vertical lines separate well-annotated genes, moderately annotated genes, and genes seldom annotated. Therefore genes can be divided into 4 areas:

  • Top-right: 48 easy targets/genes receiving high genetic priority and being well-annotated. They are as expected considering sufficient existing knowledge in support. The followup risk is low but the novelty is compromised.
  • Top-middle: 124 a2maid targets/genes receiving high genetic priority and being moderately annotated. They might represent attractive targets as both the risk and novelty are at the twilight zone. The origin of the wording a2maid is explained here.1
  • Top-left: 583 novel targets/genes receiving high genetic priority but with seldom annotations. Its great novelty comes the high risk to follow-up.
  • Bottom: targets unlikely related to AS (maybe other immune diseases).

The 48 easy targets/genes are:

Target.Symbol Target.Name priority PS Nominated
IL6R interleukin 6 receptor 0.00705 5 0
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 0.00685 4 1
CARD9 caspase recruitment domain family, member 9 0.00573 4 0
TNF tumor necrosis factor 0.00544 4 0
TYK2 tyrosine kinase 2 0.00373 4 0
IFIH1 interferon induced with helicase C domain 1 0.00368 4 0
CD27 CD27 molecule 0.00312 4 0
FCGR3A Fc fragment of IgG, low affinity IIIa, receptor (CD16a) 0.00262 4 0
IL12B interleukin 12B 0.00240 5 0
NFKB1 nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 0.00238 4 0
IL2RA interleukin 2 receptor, alpha 0.00236 5 1
ICAM1 intercellular adhesion molecule 1 0.00176 4 0
NCF4 neutrophil cytosolic factor 4, 40kDa 0.00156 4 0
AIRE autoimmune regulator 0.00091 4 0
ADAR adenosine deaminase, RNA-specific 0.00090 4 1
HLA-B major histocompatibility complex, class I, B 0.00076 4 0
HLA-A major histocompatibility complex, class I, A 0.00068 5 0
B2M beta-2-microglobulin 0.00062 5 0
NOTCH1 notch 1 0.00061 4 0
IFNG interferon, gamma 0.00022 4 0
STAT4 signal transducer and activator of transcription 4 0.00020 4 0
IL12A interleukin 12A 0.00020 4 0
JAK2 Janus kinase 2 0.00018 5 0
IL1B interleukin 1, beta 0.00015 4 0
NFKB2 nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 (p49/p100) 0.00015 5 0
IL2 interleukin 2 0.00014 4 0
IL10 interleukin 10 0.00014 5 0
CD3E CD3e molecule, epsilon (CD3-TCR complex) 0.00014 5 0
JAK3 Janus kinase 3 0.00013 4 0
STAT3 signal transducer and activator of transcription 3 (acute-phase response factor) 0.00013 4 0
MALT1 MALT1 paracaspase 0.00012 4 0
PIK3R1 phosphoinositide-3-kinase, regulatory subunit 1 (alpha) 0.00012 5 0
IL1RN interleukin 1 receptor antagonist 0.00011 5 0
ITGB2 integrin, beta 2 (complement component 3 receptor 3 and 4 subunit) 0.00011 5 0
NFKBIA nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha 0.00011 5 0
IL6 interleukin 6 0.00011 5 0
CBL Cbl proto-oncogene, E3 ubiquitin protein ligase 0.00010 4 0
PIK3CA phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha 0.00010 4 1
HLA-DRB1 major histocompatibility complex, class II, DR beta 1 0.00010 4 0
BCL10 B-cell CLL/lymphoma 10 0.00009 4 0
CD3G CD3g molecule, gamma (CD3-TCR complex) 0.00009 4 0
CD8A CD8a molecule 0.00009 4 0
IRF8 interferon regulatory factor 8 0.00009 4 0
AKT1 v-akt murine thymoma viral oncogene homolog 1 0.00008 5 0
TLR3 toll-like receptor 3 0.00008 5 0
FOS FBJ murine osteosarcoma viral oncogene homolog 0.00008 4 0
IL7R interleukin 7 receptor 0.00008 5 0
FASLG Fas ligand (TNF superfamily, member 6) 0.00008 4 0

The top 10 a2maid targets/genes are:

Target.Symbol Target.Name priority PS Nominated
ERAP1 endoplasmic reticulum aminopeptidase 1 0.03099 2 0
IL23R interleukin 23 receptor 0.01719 3 0
HLA-C major histocompatibility complex, class I, C 0.01127 3 0
LTA lymphotoxin alpha 0.00642 3 0
NPEPPS aminopeptidase puromycin sensitive 0.00632 2 0
CACNA1S calcium channel, voltage-dependent, L type, alpha 1S subunit 0.00570 2 0
FCGR2A Fc fragment of IgG, low affinity IIa, receptor (CD32) 0.00525 2 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 0.00512 2 0
FCGR2B Fc fragment of IgG, low affinity IIb, receptor (CD32) 0.00425 3 0
ITGB3 integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61) 0.00378 3 0

The top 5 novel targets/genes are:

Target.Symbol Target.Name priority PS Nominated
CAST calpastatin 0.07979 1 0
ERAP2 endoplasmic reticulum aminopeptidase 2 0.03431 0 0
B3GNT2 UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 2 0.01958 0 0
LNPEP leucyl/cystinyl aminopeptidase 0.01603 1 0
DDX39B DEAD (Asp-Glu-Ala-Asp) box polypeptide 39B 0.01211 1 0

Label the gene network with annotation prediction

The nodes/genes in AS-specific gene network are color-coded by annotation additive predictor scores.

g <- AS_net
evi <- AS_combined[,3]
names(evi) <- rownames(AS_combined)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Nodes/genes evidenced by an individual predictor and/or nominated by TPN members are highlighted in pink. Here, only nodes with additive predictor scores >=4 are labeled by gene symbols.

g <- AS_net
evi <- AS_combined[, c(4:8,12)]
data <- evi[V(g)$name,]
vertex.label <- V(g)$name
vertex.label[apply(data[,1:5],1,sum)<=3] <- ''
visNetMul(g, data=data, glayout=layout_(g, with_kk()), vertex.label=vertex.label, vertex.shape="sphere", colormap="white-magenta", newpage=F, colorbar=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, vertex.size=8, mtext.cex=0.7,border.color="black",zlim=c(0,1), mtext.side=3, mtext.adj=0)

6.4 Integration for Spondyloarthritis

Recall: SA genes prioritised by Pi are stored in the data frame SA_genes, and the data frame iPS for annotation predictor scores for all genes stored in iSymbol or iGenes (see above).

# find genes with annotation predictors  
ind <- match(rownames(SA_genes), iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine SA_genes iPS iGenes iNominated
SA_combined <- cbind(SA_genes[,c(1,4)], genes_ap)

The top 20 SA-specific genes along with annotation predictors and TPN nominations are listed below:

name priority PS OMIM Phenotype Function Pathway Druggable Nominated
HLA-C 0.04467 3 1 0 1 1 0 0
MUC22 0.02712 0 0 0 0 0 0 0
CAST 0.02063 1 0 0 0 0 1 0
CCHCR1 0.01795 0 0 0 0 0 0 0
DDX39B 0.01717 1 0 0 0 0 1 0
MUC21 0.01523 0 0 0 0 0 0 0
ATF6B 0.01477 0 0 0 0 0 0 0
NFKBIL1 0.01391 1 1 0 0 0 0 0
ATP6V1G2 0.01376 0 0 0 0 0 0 0
POU5F1 0.01277 1 0 0 0 0 1 0
ERAP2 0.00930 0 0 0 0 0 0 0
IER3 0.00904 0 0 0 0 0 0 0
LTA 0.00883 3 1 0 0 1 1 0
ERAP1 0.00838 2 0 0 1 1 0 0
TRAF3IP2 0.00793 1 0 1 0 0 0 0
TNF 0.00770 4 1 0 1 1 1 0
MDC1 0.00768 0 0 0 0 0 0 0
LTB 0.00752 0 0 0 0 0 0 0
TNIP1 0.00745 0 0 0 0 0 0 0
ANXA6 0.00713 0 0 0 0 0 0 0

Comparing genetic priority and annotation prediction

This violin plot displays density (the violin shape) of genes receiving SA-specific genetic priority versus annotation predictor scores (general indicator of immune-relatedness).

hline <- quantile(SA_combined$priority,0.95) # genetic priority at the top 5%
hline1 <- quantile(SA_combined$priority,1) # genetic priority at the top
p <- ggplot(SA_combined, aes(factor(PS), priority))
p + geom_violin(trim=T,adjust=0.5) + scale_y_log10() + ylab("Spondyloarthritis-specific genetic priority") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=hline,color="darkgreen",linetype="dotdash") + geom_vline(xintercept=c(2,5)+0.5,color="transparent",linetype="dotdash") + geom_segment(aes(x=2.5,xend=2.5,y=hline,yend=hline1),color="red",linetype="dotdash") + geom_segment(aes(x=4.5,xend=4.5,y=hline,yend=hline1),color="red",linetype="dotdash")

A horizontal line is for genetic priority at the top 5%, and two vertical lines separate well-annotated genes, moderately annotated genes, and genes seldom annotated. Therefore genes can be divided into 4 areas:

  • Top-right: 48 easy targets/genes receiving high genetic priority and being well-annotated. They are as expected considering sufficient existing knowledge in support. The followup risk is low but the novelty is compromised.
  • Top-middle: 105 a2maid targets/genes receiving high genetic priority and being moderately annotated. They might represent attractive targets as both the risk and novelty are at the twilight zone.
  • Top-left: 602 novel targets/genes receiving high genetic priority but with seldom annotations. Its great novelty comes the high risk to follow-up.
  • Bottom: targets unlikely related to Spondyloarthritis (maybe other immune diseases).

The 48 easy targets/genes are:

Target.Symbol Target.Name priority PS Nominated
TNF tumor necrosis factor 0.00770 4 0
TYK2 tyrosine kinase 2 0.00518 4 0
IL12B interleukin 12B 0.00399 5 0
HLA-DQA1 major histocompatibility complex, class II, DQ alpha 1 0.00312 4 0
ICAM1 intercellular adhesion molecule 1 0.00312 4 0
HLA-DQB1 major histocompatibility complex, class II, DQ beta 1 0.00245 4 0
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 0.00190 4 1
IL6R interleukin 6 receptor 0.00187 5 0
NFKBIA nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha 0.00169 5 0
DDX58 DEAD (Asp-Glu-Ala-Asp) box polypeptide 58 0.00167 4 0
CARD9 caspase recruitment domain family, member 9 0.00149 4 0
CFB complement factor B 0.00088 5 0
IFIH1 interferon induced with helicase C domain 1 0.00088 4 0
CD27 CD27 molecule 0.00081 4 0
STAT3 signal transducer and activator of transcription 3 (acute-phase response factor) 0.00075 4 0
NFKB1 nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 0.00071 4 0
FCGR3A Fc fragment of IgG, low affinity IIIa, receptor (CD16a) 0.00069 4 0
IL2RA interleukin 2 receptor, alpha 0.00066 5 1
ADAR adenosine deaminase, RNA-specific 0.00045 4 1
C4A complement component 4A (Rodgers blood group) 0.00043 4 0
NCF4 neutrophil cytosolic factor 4, 40kDa 0.00042 4 0
HLA-B major histocompatibility complex, class I, B 0.00042 4 0
HLA-DRB1 major histocompatibility complex, class II, DR beta 1 0.00040 4 0
HLA-A major histocompatibility complex, class I, A 0.00038 5 0
B2M beta-2-microglobulin 0.00035 5 0
AIRE autoimmune regulator 0.00026 4 0
MAPK1 mitogen-activated protein kinase 1 0.00025 4 0
PSMB8 proteasome (prosome, macropain) subunit, beta type, 8 0.00023 5 0
CD3E CD3e molecule, epsilon (CD3-TCR complex) 0.00022 5 0
IRF8 interferon regulatory factor 8 0.00022 4 0
CIITA class II, major histocompatibility complex, transactivator 0.00021 4 1
CD8A CD8a molecule 0.00019 4 0
IRF7 interferon regulatory factor 7 0.00019 4 0
CD3G CD3g molecule, gamma (CD3-TCR complex) 0.00019 4 0
CD3D CD3d molecule, delta (CD3-TCR complex) 0.00018 4 0
IKBKG inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase gamma 0.00018 5 0
PTPN11 protein tyrosine phosphatase, non-receptor type 11 0.00018 4 0
ISG15 ISG15 ubiquitin-like modifier 0.00018 4 0
NOTCH1 notch 1 0.00017 4 0
PML promyelocytic leukemia 0.00016 5 0
STAT5B signal transducer and activator of transcription 5B 0.00015 4 0
CHUK conserved helix-loop-helix ubiquitous kinase 0.00013 4 0
NFKB2 nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 (p49/p100) 0.00013 5 0
IKBKB inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase beta 0.00011 5 0
IFNG interferon, gamma 0.00011 4 0
IL12A interleukin 12A 0.00011 4 0
MS4A1 membrane-spanning 4-domains, subfamily A, member 1 0.00011 4 0
CD40 CD40 molecule, TNF receptor superfamily member 5 0.00011 5 0

The top 10 a2maid targets/genes are:

Target.Symbol Target.Name priority PS Nominated
HLA-C major histocompatibility complex, class I, C 0.04467 3 0
LTA lymphotoxin alpha 0.00883 3 0
ERAP1 endoplasmic reticulum aminopeptidase 1 0.00838 2 0
ICAM3 intercellular adhesion molecule 3 0.00460 2 0
IL23R interleukin 23 receptor 0.00450 3 0
TNFAIP3 tumor necrosis factor, alpha-induced protein 3 0.00416 2 1
FYN FYN proto-oncogene, Src family tyrosine kinase 0.00347 3 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 0.00315 2 0
HLA-E major histocompatibility complex, class I, E 0.00269 2 0
NOS2 nitric oxide synthase 2, inducible 0.00230 3 0

The top 5 novel targets/genes are:

Target.Symbol Target.Name priority PS Nominated
MUC22 mucin 22 0.02712 0 0
CAST calpastatin 0.02063 1 0
CCHCR1 coiled-coil alpha-helical rod protein 1 0.01795 0 0
DDX39B DEAD (Asp-Glu-Ala-Asp) box polypeptide 39B 0.01717 1 0
MUC21 mucin 21, cell surface associated 0.01523 0 0

Label the gene network with annotation prediction

The nodes/genes in Spondyloarthritis-specific gene network are color-coded by annotation additive predictor scores.

g <- SA_net
evi <- SA_combined[,3]
names(evi) <- rownames(SA_combined)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Nodes/genes evidenced by an individual predictor and/or nominated by TPN members are highlighted in pink. Here, only nodes with additive predictor scores >=4 are labeled by gene symbols.

g <- SA_net
evi <- SA_combined[, c(4:8,12)]
data <- evi[V(g)$name,]
vertex.label <- V(g)$name
vertex.label[apply(data[,1:5],1,sum)<=3] <- ''
visNetMul(g, data=data, glayout=layout_(g, with_kk()), vertex.label=vertex.label, vertex.shape="sphere", colormap="white-magenta", newpage=F, colorbar=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, vertex.size=8, mtext.cex=0.7,border.color="black",zlim=c(0,1), mtext.side=3, mtext.adj=0)

6.5 Integration for Systemic Lupus Erythematosus

Recall: SLE genes prioritised by Pi are stored in the data frame SLE_genes, and the data frame iPS for annotation predictor scores for all genes stored in iSymbol or iGenes (see above).

# find genes with annotation predictors  
ind <- match(rownames(SLE_genes), iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine SLE_genes iPS iGenes iNominated
SLE_combined <- cbind(SLE_genes[,c(1,4)], genes_ap)

The top 20 SLE-specific genes along with annotation predictors and TPN nominations are listed below:

name priority PS OMIM Phenotype Function Pathway Druggable Nominated
ATF6B 0.02673 0 0 0 0 0 0 0
CLIC1 0.02307 0 0 0 0 0 0 0
SKIV2L 0.02112 0 0 0 0 0 0 0
FKBPL 0.01897 0 0 0 0 0 0 0
BAG6 0.01830 0 0 0 0 0 0 0
CFB 0.01567 5 1 1 1 1 1 0
PRRC2A 0.01492 0 0 0 0 0 0 0
AIF1 0.01464 1 0 0 1 0 0 0
NELFE 0.01448 0 0 0 0 0 0 0
HSPA1L 0.01440 0 0 0 0 0 0 0
VARS 0.01438 1 0 0 0 0 1 1
LSM2 0.01438 0 0 0 0 0 0 0
HSPA1A 0.01422 0 0 0 0 0 0 0
NEU1 0.01420 2 0 1 0 0 1 0
CSNK2B 0.01394 0 0 0 0 0 0 0
HSPA1B 0.01385 0 0 0 0 0 0 0
EHMT2 0.01350 1 0 0 0 0 1 0
PPT2 0.01347 0 0 0 0 0 0 0
DDAH2 0.01347 0 0 0 0 0 0 0
LTA 0.01320 3 1 0 0 1 1 0

Comparing genetic priority and annotation prediction

This violin plot displays density (the violin shape) of genes receiving SLE-specific genetic priority versus annotation predictor scores (general indicator of immune-relatedness).

hline <- quantile(SLE_combined$priority,0.95) # genetic priority at the top 5%
hline1 <- quantile(SLE_combined$priority,1) # genetic priority at the top
p <- ggplot(SLE_combined, aes(factor(PS), priority))
p + geom_violin(trim=T,adjust=0.5) + scale_y_log10() + ylab("SLE-specific genetic priority") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=hline,color="darkgreen",linetype="dotdash") + geom_vline(xintercept=c(2,5)+0.5,color="transparent",linetype="dotdash") + geom_segment(aes(x=2.5,xend=2.5,y=hline,yend=hline1),color="red",linetype="dotdash") + geom_segment(aes(x=4.5,xend=4.5,y=hline,yend=hline1),color="red",linetype="dotdash")

A horizontal line is for genetic priority at the top 5%, and two vertical lines separate well-annotated genes, moderately annotated genes, and genes seldom annotated. Therefore genes can be divided into 4 areas:

  • Top-right: 43 easy targets/genes receiving high genetic priority and being well-annotated. They are as expected considering sufficient existing knowledge in support. The followup risk is low but the novelty is compromised.
  • Top-middle: 98 a2maid targets/genes receiving high genetic priority and being moderately annotated. They might represent attractive targets as both the risk and novelty are at the twilight zone.
  • Top-left: 614 novel targets/genes receiving high genetic priority but with seldom annotations. Its great novelty comes the high risk to follow-up.
  • Bottom: targets unlikely related to SLE (maybe other immune diseases).

The 43 easy targets/genes are:

Target.Symbol Target.Name priority PS Nominated
CFB complement factor B 0.01567 5 0
STAT4 signal transducer and activator of transcription 4 0.01047 4 0
TNF tumor necrosis factor 0.00791 4 0
HLA-DQB1 major histocompatibility complex, class II, DQ beta 1 0.00467 4 0
C4A complement component 4A (Rodgers blood group) 0.00435 4 0
HLA-DQA1 major histocompatibility complex, class II, DQ alpha 1 0.00368 4 0
PTPN22 protein tyrosine phosphatase, non-receptor type 22 (lymphoid) 0.00224 4 2
TYK2 tyrosine kinase 2 0.00133 4 0
IRF8 interferon regulatory factor 8 0.00106 4 0
IFIH1 interferon induced with helicase C domain 1 0.00097 4 0
IRF7 interferon regulatory factor 7 0.00086 4 0
CDKN1B cyclin-dependent kinase inhibitor 1B (p27, Kip1) 0.00076 4 0
MS4A1 membrane-spanning 4-domains, subfamily A, member 1 0.00069 4 0
PSMB8 proteasome (prosome, macropain) subunit, beta type, 8 0.00067 5 0
HRAS Harvey rat sarcoma viral oncogene homolog 0.00066 4 0
IL10 interleukin 10 0.00059 5 0
FCGR3A Fc fragment of IgG, low affinity IIIa, receptor (CD16a) 0.00051 4 0
PNP purine nucleoside phosphorylase 0.00050 4 0
IL12A interleukin 12A 0.00044 4 0
ICAM1 intercellular adhesion molecule 1 0.00042 4 0
HLA-DRB1 major histocompatibility complex, class II, DR beta 1 0.00040 4 0
MAPK1 mitogen-activated protein kinase 1 0.00029 4 0
CIITA class II, major histocompatibility complex, transactivator 0.00027 4 1
CHUK conserved helix-loop-helix ubiquitous kinase 0.00024 4 0
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 0.00019 4 1
CD4 CD4 molecule 0.00018 5 0
BCL2 B-cell CLL/lymphoma 2 0.00018 5 0
C3 complement component 3 0.00016 5 0
MASP2 mannan-binding lectin serine peptidase 2 0.00016 5 0
CARD9 caspase recruitment domain family, member 9 0.00016 4 0
EGFR epidermal growth factor receptor 0.00015 4 0
CYBB cytochrome b-245, beta polypeptide 0.00015 5 0
NFKB2 nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 (p49/p100) 0.00014 5 0
ITGB2 integrin, beta 2 (complement component 3 receptor 3 and 4 subunit) 0.00014 5 0
NFKB1 nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 0.00014 4 0
C1QA complement component 1, q subcomponent, A chain 0.00013 4 0
IKBKB inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase beta 0.00013 5 0
PIK3R1 phosphoinositide-3-kinase, regulatory subunit 1 (alpha) 0.00013 5 0
IKBKG inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase gamma 0.00012 5 0
PSTPIP1 proline-serine-threonine phosphatase interacting protein 1 0.00011 4 0
MEFV Mediterranean fever 0.00011 4 0
NLRP3 NLR family, pyrin domain containing 3 0.00010 4 1
MYD88 myeloid differentiation primary response 88 0.00010 4 0

The top 10 a2maid targets/genes are:

Target.Symbol Target.Name priority PS Nominated
NEU1 sialidase 1 (lysosomal sialidase) 0.01420 2 0
LTA lymphotoxin alpha 0.01320 3 0
CYP21A2 cytochrome P450, family 21, subfamily A, polypeptide 2 0.01257 2 0
ITGAM integrin, alpha M (complement component 3 receptor 3 subunit) 0.01162 3 1
AGER advanced glycosylation end product-specific receptor 0.01143 2 0
PYCARD PYD and CARD domain containing 0.01075 2 0
HLA-DPB1 major histocompatibility complex, class II, DP beta 1 0.00967 2 0
IRF5 interferon regulatory factor 5 0.00945 3 1
HLA-DRA major histocompatibility complex, class II, DR alpha 0.00938 2 0
PSMB9 proteasome (prosome, macropain) subunit, beta type, 9 0.00909 3 0

The top 5 novel targets/genes are:

Target.Symbol Target.Name priority PS Nominated
ATF6B activating transcription factor 6 beta 0.02673 0 0
CLIC1 chloride intracellular channel 1 0.02307 0 0
SKIV2L superkiller viralicidic activity 2-like (S. cerevisiae) 0.02112 0 0
FKBPL FK506 binding protein like 0.01897 0 0
BAG6 BCL2-associated athanogene 6 0.01830 0 0

Label the gene network with annotation prediction

The nodes/genes in SLE-specific gene network are color-coded by annotation additive predictor scores.

g <- SLE_net
evi <- SLE_combined[,3]
names(evi) <- rownames(SLE_combined)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

Nodes/genes evidenced by an individual predictor and/or nominated by TPN members are highlighted in pink. Here, only nodes with additive predictor scores >=4 are labeled by gene symbols.

g <- SLE_net
evi <- SLE_combined[, c(4:8,12)]
data <- evi[V(g)$name,]
vertex.label <- V(g)$name
vertex.label[apply(data[,1:5],1,sum)<=3] <- ''
visNetMul(g, data=data, glayout=layout_(g, with_kk()), vertex.label=vertex.label, vertex.shape="sphere", colormap="white-magenta", newpage=F, colorbar=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, vertex.size=8, mtext.cex=0.7,border.color="black",zlim=c(0,1), mtext.side=3, mtext.adj=0)

7 Cross-disease comparisons

7.1 Gene-level comparisons

Recall: disease-centric genetic prioritisation for genes by Pi are stored respectively in three data frames, that is, AS_genes for Ankylosing Spondylitis, SA_genes for Spondyloarthritis and SLE_genes for Systemic Lupus Erythematosus as exemplars. The data frame iPS stores annotation predictor scores for all genes, the vector iNominated for nominated genes, and iSymbol or iGenes for gene info.

# Ankylosing Spondylitis (AS)
ind <- match(iSymbol, AS_genes$name)
df_AS <- AS_genes[ind,]
# Spondyloarthritis (SA)
ind <- match(iSymbol, SA_genes$name)
df_SA <- SA_genes[ind,]
# Systemic Lupus Erythematosus (SLE)
ind <- match(iSymbol, SLE_genes$name)
df_SLE <- SLE_genes[ind,]

# Combine into a data frame 'df_rank' and `df_priority`
## rank
df_rank <- cbind(AS=df_AS$rank, SA=df_SA$rank, SLE=df_SLE$rank)
## priority
df_priority <- cbind(AS=df_AS$priority, SA=df_SA$priority, SLE=df_SLE$priority)
rownames(df_rank) <- rownames(df_priority) <- iSymbol

Venn diagram of the top 200 genes across diseases

ntop <- 200
data <- list()
data$AS <- iSymbol[which(df_rank[,1] <= ntop)]
data$SA <- iSymbol[which(df_rank[,2] <= ntop)]
data$SLE <- iSymbol[which(df_rank[,3] <= ntop)]
category.names <- c('Ankylosing Spondylitis (AS)', 'Spondyloarthritis (SA)', 'Systemic Lupus Erythematosus (SLE)')
vp <- venn.diagram(x=data, filename=NULL, fill=c("skyblue","pink1","mediumorchid"), category.names=names(data), margin=0.15)
grid.draw(vp)

The 18 genes common to three diseases (along with annotation additive predictor scores and how many times being nominated by TPN members) are listed below:

Target.Symbol Target.Name Annotated Nominated
UBE2L3 ubiquitin-conjugating enzyme E2L 3 1 1
TNF tumor necrosis factor 4 0
TYK2 tyrosine kinase 2 4 0
FCGR2B Fc fragment of IgG, low affinity IIb, receptor (CD32) 3 0
LTA lymphotoxin alpha 3 0
FCGR2A Fc fragment of IgG, low affinity IIa, receptor (CD32) 2 0
ICAM3 intercellular adhesion molecule 3 2 0
AIF1 allograft inflammatory factor 1 1 0
DDX39B DEAD (Asp-Glu-Ala-Asp) box polypeptide 39B 1 0
LTBR lymphotoxin beta receptor (TNFR superfamily, member 3) 1 0
NCR3 natural cytotoxicity triggering receptor 3 1 0
NFKBIL1 nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 1 0
ATP6V1G2 ATPase, H+ transporting, lysosomal 13kDa, V1 subunit G2 0 0
CDC37 cell division cycle 37 0 0
LTB lymphotoxin beta (TNF superfamily, member 3) 0 0
PRRC2A proline-rich coiled-coil 2A 0 0
RAVER1 ribonucleoprotein, PTB-binding 1 0 0
RLTPR RGD motif, leucine rich repeats, tropomodulin domain and proline-rich containing 0 0

Venn diagram of the top 200 genes in any diseases vs 714 TPN nominated genes

ntop <- 200
data <- list()
data$AS <- iSymbol[which(df_rank[,1] <= ntop)]
data$SA <- iSymbol[which(df_rank[,2] <= ntop)]
data$SLE <- iSymbol[which(df_rank[,3] <= ntop)]
data$Prioritised <- base::Reduce(union, data[1:3])
data$Nominated <- iSymbol[which(iNominated>0)]
category.names <- c('Prioritised', 'Nominated')
vp <- venn.diagram(x=data[4:5], filename=NULL, fill=c("red","orange"), category.names=category.names, margin=0.15, cat.pos=1)
grid.draw(vp)

The 32 genes common to disease-specific genetic prioritisations and TPN nominations (along with annotation additive predictor scores and how many times being nominated by TPN members) are listed below:

Target.Symbol Target.Name Annotated Nominated
PXK PX domain containing serine/threonine kinase 1 4
PTPN22 protein tyrosine phosphatase, non-receptor type 22 (lymphoid) 4 2
IL2RA interleukin 2 receptor, alpha 5 1
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 4 1
BLK BLK proto-oncogene, Src family tyrosine kinase 3 1
IRF5 interferon regulatory factor 5 3 1
ITGAM integrin, alpha M (complement component 3 receptor 3 subunit) 3 1
TAP1 transporter 1, ATP-binding cassette, sub-family B (MDR/TAP) 3 1
TNFAIP3 tumor necrosis factor, alpha-induced protein 3 2 1
BANK1 B-cell scaffold protein with ankyrin repeats 1 1 1
DDR1 discoidin domain receptor tyrosine kinase 1 1 1
GPR65 G protein-coupled receptor 65 1 1
IL23A interleukin 23, alpha subunit p19 1 1
KAT8 K(lysine) acetyltransferase 8 1 1
SCNN1A sodium channel, non voltage gated 1 alpha subunit 1 1
SLC11A1 solute carrier family 11 (proton-coupled divalent metal ion transporter), member 1 1 1
SLC15A4 solute carrier family 15 (oligopeptide transporter), member 4 1 1
SPHK2 sphingosine kinase 2 1 1
UBE2L3 ubiquitin-conjugating enzyme E2L 3 1 1
VARS valyl-tRNA synthetase 1 1
APOBEC4 apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 4 (putative) 0 1
BRWD1 bromodomain and WD repeat domain containing 1 0 1
CCDC101 coiled-coil domain containing 101 0 1
MBTPS2 membrane-bound transcription factor peptidase, site 2 0 1
ORAI3 ORAI calcium release-activated calcium modulator 3 0 1
PARP11 poly (ADP-ribose) polymerase family, member 11 0 1
RAB5A RAB5A, member RAS oncogene family 0 1
RIOK2 RIO kinase 2 0 1
SLC9A8 solute carrier family 9, subfamily A (NHE8, cation proton antiporter 8), member 8 0 1
SMPD2 sphingomyelin phosphodiesterase 2, neutral membrane (neutral sphingomyelinase) 0 1
TET3 tet methylcytosine dioxygenase 3 0 1
VARS2 valyl-tRNA synthetase 2, mitochondrial 0 1

7.2 Pathway-level comparisons

Recall: disease-centric genetic prioritisation for pathways by Pi are stored respectively in three data frames, that is, AS_pathways for Ankylosing Spondylitis, SA_pathways for Spondyloarthritis and SLE_pathways for Systemic Lupus Erythematosus as exemplars.

Load all pathways into a data frame iPathway:

org.Hs.egMsigdbC2CPall <- xRDataLoader(RData.customised='org.Hs.egMsigdbC2CPall', RData.location=RData.location)
iPathway <- org.Hs.egMsigdbC2CPall$set_info[, c(1,2)]
# Ankylosing Spondylitis (AS)
ind <- match(rownames(iPathway), rownames(AS_pathways))
df_AS <- AS_pathways[ind,]
# Spondyloarthritis (SA)
ind <- match(rownames(iPathway), rownames(SA_pathways))
df_SA <- SA_pathways[ind,]
# Systemic Lupus Erythematosus (SLE)
ind <- match(rownames(iPathway), rownames(SLE_pathways))
df_SLE <- SLE_pathways[ind,]

# Combine into a data frame 'df_adjp'
## adjusted p-values
df_adjp <- cbind(AS=df_AS$adjp, SA=df_SA$adjp, SLE=df_SLE$adjp)
rownames(df_adjp) <- iPathway$name

This barplot displays prioritised pathways for three diseases, in which horizontal lines are used to indicate three groups of pathways prioritised:

  • Top panel: pathways common to all diseases, including IL23-mediated signaling events, IL27-mediated signaling events.
  • Middle panel: pathways common to any two diseases.
  • Bottom panel: pathways unique to individual diseases.

7.3 Network-level comparisons

Recall: the gene network identified by Pi are stored respectively in three data frames, that is, AS_net for Ankylosing Spondylitis, SA_net for Spondyloarthritis and SLE_net for Systemic Lupus Erythematosus as exemplars. The data frame iPS stores annotation predictor scores for all genes, the vector iNominated for nominated genes, and iSymbol or iGenes for gene info.

# Ankylosing Spondylitis (AS)
ind <- match(iSymbol, V(AS_net)$name)
df_AS <- V(AS_net)$name[ind]
# Spondyloarthritis (SA)
ind <- match(iSymbol, V(SA_net)$name)
df_SA <- V(SA_net)$name[ind]
# Systemic Lupus Erythematosus (SLE)
ind <- match(iSymbol, V(SLE_net)$name)
df_SLE <- V(SLE_net)$name[ind]

# Combine into a data frame 'df_net'
df_net <- cbind(AS=df_AS, SA=df_SA, SLE=df_SLE)
rownames(df_net) <- iSymbol

Venn diagram of network genes across diseases

data <- list()
data$AS <- iSymbol[!is.na(df_net[,1])]
data$SA <- iSymbol[!is.na(df_net[,2])]
data$SLE <- iSymbol[!is.na(df_net[,3])]
category.names <- c('Ankylosing Spondylitis (AS)', 'Spondyloarthritis (SA)', 'Systemic Lupus Erythematosus (SLE)')
vp <- venn.diagram(x=data, filename=NULL, fill=c("skyblue","pink1","mediumorchid"), category.names=names(data), margin=0.2)
grid.draw(vp)

The common part of the gene network shared by AS, SA and SLE

# identify common parts of SLE-network and AS-network. 
net_AS_SA_SLE <- graph.intersection(AS_net, SA_net, SLE_net, keep.all.vertices=F)

# append genes with info about annotation predictors  
g <- net_AS_SA_SLE
ind <- match(V(g)$name, iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine iGenes iPS iNominated
g_info <- genes_ap[,c(8:9,1,10)]
colnames(g_info)[3] <- 'Annotated'
rownames(g_info) <- g_info[,1]

The 7 genes common to three diseases (along with annotation additive predictor scores and how many times being nominated by TPN members) are listed below:

Target.Symbol Target.Name Annotated Nominated
ICAM1 intercellular adhesion molecule 1 4 0
TNF tumor necrosis factor 4 0
TYK2 tyrosine kinase 2 4 0
ICAM3 intercellular adhesion molecule 3 2 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 2 0
UBA52 ubiquitin A-52 residue ribosomal protein fusion product 1 2 0
CDC37 cell division cycle 37 0 0

The network nodes/genes shared by three diseases are color-coded by annotation additive predictor scores.

# visualise intersected network part with nodes/genes color-encoded by annotation predictors
evi <- g_info[,3]
names(evi) <- rownames(g_info)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

The common part of the gene network shared by AS and SA

# identify common parts of AS-network and SA-network. 
net_AS_SA <- graph.intersection(AS_net, SA_net, keep.all.vertices=F)

# append genes with info about annotation predictors  
g <- net_AS_SA
ind <- match(V(g)$name, iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine iGenes iPS iNominated
g_info <- genes_ap[,c(8:9,1,10)]
colnames(g_info)[3] <- 'Annotated'
rownames(g_info) <- g_info[,1]

The common 21 genes along with annotation predictors and TPN nominations are listed below:

Target.Symbol Target.Name Annotated Nominated
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 4 1
IL12B interleukin 12B 5 0
IL6R interleukin 6 receptor 5 0
ICAM1 intercellular adhesion molecule 1 4 0
TNF tumor necrosis factor 4 0
TYK2 tyrosine kinase 2 4 0
HLA-C major histocompatibility complex, class I, C 3 0
IL23R interleukin 23 receptor 3 0
LTA lymphotoxin alpha 3 0
NOS2 nitric oxide synthase 2, inducible 3 0
ERAP1 endoplasmic reticulum aminopeptidase 1 2 0
ICAM3 intercellular adhesion molecule 3 2 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 2 0
UBA52 ubiquitin A-52 residue ribosomal protein fusion product 1 2 0
CAST calpastatin 1 0
IL12RB2 interleukin 12 receptor, beta 2 1 0
LNPEP leucyl/cystinyl aminopeptidase 1 0
LTBR lymphotoxin beta receptor (TNFR superfamily, member 3) 1 0
CDC37 cell division cycle 37 0 0
ERAP2 endoplasmic reticulum aminopeptidase 2 0 0
LTB lymphotoxin beta (TNF superfamily, member 3) 0 0

The network nodes/genes shared by AS and SA are color-coded by annotation additive predictor scores.

# visualise intersected network part with nodes/genes color-encoded by annotation predictors
evi <- g_info[,3]
names(evi) <- rownames(g_info)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))

The common part of the gene network shared by SLE and SA

The common 11 genes along with annotation predictors and TPN nominations are listed below:

Target.Symbol Target.Name Annotated Nominated
ITGAM integrin, alpha M (complement component 3 receptor 3 subunit) 3 1
ICAM1 intercellular adhesion molecule 1 4 0
TNF tumor necrosis factor 4 0
TYK2 tyrosine kinase 2 4 0
ICAM3 intercellular adhesion molecule 3 2 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 2 0
UBA52 ubiquitin A-52 residue ribosomal protein fusion product 1 2 0
GPX3 glutathione peroxidase 3 1 0
SQSTM1 sequestosome 1 1 0
CDC37 cell division cycle 37 0 0
RPL34 ribosomal protein L34 0 0

The network nodes/genes shared by SLE and SA are color-coded by annotation additive predictor scores.

The common part of the gene network shared by SLE and AS

The common 11 genes along with annotation predictors and TPN nominations are listed below:

Target.Symbol Target.Name Annotated Nominated
FCGR3A Fc fragment of IgG, low affinity IIIa, receptor (CD16a) 4 0
ICAM1 intercellular adhesion molecule 1 4 0
TNF tumor necrosis factor 4 0
TYK2 tyrosine kinase 2 4 0
FCGR2B Fc fragment of IgG, low affinity IIb, receptor (CD32) 3 0
FCGR2A Fc fragment of IgG, low affinity IIa, receptor (CD32) 2 0
ICAM3 intercellular adhesion molecule 3 2 0
ICAM4 intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) 2 0
UBA52 ubiquitin A-52 residue ribosomal protein fusion product 1 2 0
SOCS1 suppressor of cytokine signaling 1 1 0
CDC37 cell division cycle 37 0 0

The network nodes/genes shared by SLE and AS are color-coded by annotation additive predictor scores.

8 References

Below is a list of references that this work stands on:

1000 Genomes Project Consortium. 2012. “An integrated map of genetic variation from 1,092 human genomes.” Nature 135 (V): 0–9. doi:10.1038/nature11632.

Ashburner, M, C A Ball, J A Blake, D Botstein, H Butler, J M Cherry, A P Davis, et al. 2000. “Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.” Nat Genet 25 (1): 25–29. doi:10.1038/75556.

Cerami, E. G., B. E. Gross, E. Demir, I. Rodchenkov, O. Babur, N. Anwar, N. Schultz, G. D. Bader, and C. Sander. 2011. “Pathway Commons, a web resource for biological pathway data.” Nucleic Acids Research 39 (Database): D685–D690. doi:10.1093/nar/gkq1039.

Fairfax, Benjamin P, Peter Humburg, Seiko Makino, Vivek Naranbhai, Daniel Wong, Evelyn Lau, Luke Jostins, et al. 2014. “Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression.” Science (New York, N.Y.) 343 (March): 1246949. doi:10.1126/science.1246949.

Fang, Hai, and Julian Gough. 2014. “The ’dnet’ approach promotes emerging research on cancer patient survival.” Genome Medicine 6 (8): 64. doi:10.1186/s13073-014-0064-8.

Köhler, Sebastian, Sandra C Doelken, Christopher J Mungall, Sebastian Bauer, Helen V Firth, Isabelle Bailleul-Forestier, Graeme C M Black, et al. 2013. “The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data.” Nucleic Acids Research, November, 1–9. doi:10.1093/nar/gkt1026.

Schriml, L M, C Arze, S Nadendla, Y W Chang, M Mazaitis, V Felix, G Feng, and W A Kibbe. 2012. “Disease Ontology: a backbone for disease semantic integration.” Nucleic Acids Res 40 (Database issue): D940–6. doi:10.1093/nar/gkr972.

Smith, C L, and J T Eppig. 2009. “The Mammalian Phenotype Ontology: enabling robust annotation and comparative analysis.” Wiley Interdiscip Rev Syst Biol Med 1 (3): 390–99. doi:10.1002/wsbm.44.

Szklarczyk, Damian, Andrea Franceschini, Stefan Wyder, Kristoffer Forslund, Davide Heller, Jaime Huerta-cepas, Milan Simonovic, et al. 2015. “STRING v10 : protein – protein interaction networks , integrated over the tree of life.” Nucleic Acids Res 43 (Database): D447–D452. doi:10.1093/nar/gku1003.

Welter, Danielle, Jacqueline MacArthur, Joannella Morales, Tony Burdett, Peggy Hall, Heather Junkins, Alan Klemm, et al. 2014. “The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.” Nucleic Acids Research 42 (D1): 1001–6. doi:10.1093/nar/gkt1229.


  1. The wording a2maid comes from the mermaid (describing the maid of the sea having a fish’s tail), and instead of the maid of the sea, replacing with a2 turns to be the maid of our lab having two PhD students Anna Sanniti and Alicia Lledo Lara.